neural architecture
Learning to Flow from Generative Pretext Tasks for Neural Architecture Encoding
The performance of a deep learning model on a specific task and dataset depends heavily on its neural architecture, motivating considerable efforts to rapidly and accurately identify architectures suited to the target task and dataset. To achieve this, researchers use machine learning models--typically neural architecture encoders--to predict the performance of a neural architecture. Many state-of-the-art encoders aim to capture information flow within a neural architecture, which reflects how information moves through the forward pass and backpropagation, via a specialized model structure. However, due to their complicated structures, these flow-based encoders are significantly slower to process neural architectures compared to simpler encoders, presenting a notable practical challenge. To address this, we propose FGP, a novel pre-training method for neural architecture encoding that trains an encoder to capture the information flow without requiring specialized model structures. FGP trains an encoder to reconstruct a flow surrogate, our proposed representation of the neural architecture's information flow. Our experiments show that FGP boosts encoder performance by up to 106\% in Precision@1\%, compared to the same encoder trained solely with supervised learning.
Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude to different parts of the neural network during training. It was recently shown that Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can hardly be used for acceleration since it is unstructured. In the paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g.
e3251075554389fe91d17a794861d47b-Supplemental.pdf
Now,we describe the latencymeasurement pipeline for desktop GPUs, Jetson, serverCPUs, and mobile phone. Furthermore, evenwith the same GPU device, the correlation scores are not high ifthebatch sizes are different. Figure A.1: Visualization of 10 reference neural architectures we used for NAS-Bench-201 search space. Werandomlyselected10reference architectures for each search space (NAS-Bench-201, FBNet, and MobileNetV3) and used them across all experiments and devices of the same search space. In Figure A.1, we visualize 10 reference architectures that we used in NAS-Bench-201 search space.